Abstract

Any physical channel of communication offers two potential reasons why its capacity (the number 
of bits it can transmit in a unit of time) might be unbounded: (1) (Uncountably) infinitely many choices 
of signal strength at any given instant of time, and (2) (Uncountably) infinitely many instances of time at 
which signals may be sent. However channel noise cancels out the potential unboundedness of the first 
aspect, leaving typical channels with only a finite capacity per instant of time. The latter source of infinity 
seems less extensively studied. A potential source of unreliability that might restrict the capacity also 
from the second aspect is "delay": Signals transmitted by the sender at a given point of time may not be 
received with a predictable delay at the receiving end. In this work we examine this source of uncertainty 
by considering a simple discrete model of delay errors. In our model the communicating parties get to 
subdivide time as microscopically finely as they wish, but still have to cope with communication delays
that are macroscopic and variable. The continuous process becomes the limit of our process as the
time subdivision becomes infinitesimal. We taxonomize this class of communication channels based on
whether the delays and noise are stochastic or adversarial; and based on how much information each
aspect has about the other when introducing its errors. We analyze the limits of such channels and reach
somewhat surprising conclusions: The capacity of a physical channel is finitely bounded only if at least 
one of the two sources of error (signal noise or delay noise) is adversarial. In particular the capacity is 
finitely bounded only if the delay is adversarial, or the noise is adversarial and acts with knowledge of 
the stochastic delay. If both error sources are stochastic, or if the noise is adversarial and independent of 
the stochastic delay, then the capacity of the associated physical channel is infinite! 
1 Introduction

It seems to be a folklore assumption that any physical medium of communication is constrained to
communicating a finite number of bits per unit of time. This assumption forms the foundations of both the
theory of communication [8] as well as the theory of computing [10]. The assumption also seems well-founded
given the theory of signal processing. In particular the work of Shannon [9] explains reasons why such a
statement may be true.

Any physical channel (a copper wire, an optical fiber, vacuum etc.) in principle can be used by a sender
to transmit a signal, i.e., a function $f : [0,T] \to [0,1]$ for some time duration $T$. The receiver receives some
function $\tilde{f} : [0,T] \to [0,1]$, which tends to be a noisy, distorted version of the signal $f$. The goal of
a communication system is to design encoders and decoders that communicate reliably over this channel.
Specifically, one would like to find the largest integer $k_T$ such that there exist functions
$E : \{0,1\}^{k_T} \to \{f : [0,T] \to [0,1]\}$ and $D : \{\tilde{f} : [0,T] \to [0,1]\} \to \{0,1\}^{k_T}$ such that
$\Pr_{\tilde{f} \mid E(m)}[D(\tilde{f}) \neq m] \to 0$, where $m$ is chosen uniformly from $\{0,1\}^{k_T}$ and $\tilde{f}$ is
chosen by the channel given the input signal $E(m)$. The capacity of the channel, normalized per unit of
time, is $\limsup_{T \to \infty} k_T/T$.

¹Department of Computer and Information Science, University of Pennsylvania, Philadelphia PA. Email:
sanjeev@cis.upenn.edu. Supported in part by NSF Awards CCF-0635084 and IIS-0904314.

²Microsoft Research, Cambridge, MA 02142, and MIT, Cambridge, MA 02139. Email: madhu@mit.edu.

In a typical such channel there are two possible sources of "infinity". The signal value $f(t)$, for any
$t \in [0,T]$, ranges over an uncountably large set, and if the channel were not "noisy" this would lead to
infinite capacity, even if time were discrete. But Shannon, in his works [8], points out that usually $f(t)$ is
not transmitted as is. Typical channels tend to add noise, typically a random function $\eta(t)$, which is
modeled as a normally distributed random variable with mean zero and variance $\sigma^2$, and independent across
different instances of time $t$. He points out that after this noise's effect is taken into account, the channel
capacity is reduced to a finite number (proportional to $1/\sigma^2$) per instant of time.

Still this leaves a second possible way the channel capacity could be infinite, namely due to the availabil- 
ity of infinitely many time slots. This aspect has been considered before in the signal processing literature, 
and the works of Nyquist ff\ and Hartley [4] (see the summary in [9]) once again point out that there is 
a finite limit. However the reason for this finite limit seems more axiomatic than physical. Specifically, 
these results come from the assumption that the signal / is a linear combination of a finite number of basis 
functions, where the basis functions are sinusoids with frequency that is an integral multiple of some min- 
imal frequency, and upper bounded by some maximum frequency. This restriction is then translated into a 
"discretization" result showing it suffices to sample the signal at certain discrete time intervals, reducing the 
problem thus to a finite one. 

In this work we attempt to explore the effects of "continuous time" more in the spirit of the obstacle 
raised in the context of the signal strength, namely that there is an obstacle also to assuming that time 
is preserved strictly across the communication channel. We do so by introducing and studying a "delay
channel" where signals transmitted by the sender arrive somewhat asynchronously at the receiver's end. We 
model and study this process as the limit of a discrete process. 

In our discrete model the sender/receiver get to discretize time as finely as they wish, but there is
uncertainty/unreliability associated with the delay between when a signal is sent and when it is received. Thus
in this sense, there is timing noise that is similar in spirit to the signal noise. A signal that is sent at time
$t$ is received at time $t + \eta(t)$ where $\eta(t)$ could be a random, or adversarial, amount of delay, but whose
typical amount is a fixed constant (independent of the granularity of the discretization of time chosen by
sender/receiver). Note that this could permute the bits in the sequence sent by the sender (or do more
complex changes). We consider the effect of this delay on the channel capacity. For the sake of simplicity (and
since this is anyway without loss of generality) we assume the sender only sends a sequence of 0s and 1s. In
addition to delays we also allow the channel to inject the usual noise.

We discuss our model and results more carefully in Section 2, but let us give a preview of the results
here. It turns out that the question of when the channel capacity is finite depends on several aspects of
the model. Note there are two sources of error: the signal error, which we simply refer to as noise, and the
timing error, which we refer to as delay. As either of these error sources could be probabilistic or adversarial,
we get four possible channel models. Complicating things further is the dependence between the two: does
either of the sources of error know about the error introduced by the other? Each setting ends up requiring
a separate analysis. We taxonomize the many classes of channels that arise this way, and characterize the
capacity of all the channels. The final conclusion is the following: If the delays are adversarial, or if the delay
is stochastic and the noise is adversarial and acts with knowledge of the delay, then the channel capacity
is finite (Theorem 2.1), else it is infinite (Theorem 2.2). In particular if both sources are adversarial then
the channel capacity is finite; and perhaps most surprisingly, in possibly the most realistic setting, if both
sources are probabilistic, then the channel capacity is infinite: finer discretization always leads to increased
capacity.

Organization: Section 2 formally describes our model and results. In Sections 3 and 4 we prove our results
for finite and infinite channel capacity regimes respectively. Finally, we give some concluding thoughts in
Section 5.

2 Preliminaries, Model, and Results 

2.1 Continuous channels 

We start by describing the basic entities in a communication system and how performance is measured. 
Most of the definitions are "standard"; the only novelty here is that we allow sender/receiver to choose the 
"granularity" of time. We first start with the standard definitions. 

Channel (Generic) Given a fixed period of time $T$, a signal is a function $f : [0,T] \to \mathbb{R}$. We say the signal
is bounded if its range is $[0,1]$. A time $T$ (bounded-input) channel is given by a (possibly non-deterministic,
possibly probabilistic, or a combination) function $\mathrm{channel}_T : f \mapsto \tilde{f}$ whose input is a bounded signal
$f : [0,T] \to [0,1]$ and whose output is a signal $\tilde{f} : [0,T] \to \mathbb{R}$.

A probabilistic channel is formally given by a transition probability distribution which gives the
probability of outputting $\tilde{f}$ given input $f$. An adversarial channel is given by a set of possible functions $\tilde{f}$ for
each input $f$. We use $\tilde{f} = \mathrm{channel}_T(f)$ as shorthand for $\tilde{f}$ drawn randomly from the distribution specified
by $\mathrm{channel}_T(f)$ in the case of probabilistic channels. For adversarial channels, we use the same notation
$\tilde{f} = \mathrm{channel}_T(f)$ as shorthand for $\tilde{f}$ chosen adversarially (so as to minimize successful communication)
from $\mathrm{channel}_T(f)$.

Channels can be composed naturally, leading to interesting mixes of adversarial and stochastic channels, 
which will lead to interesting scenarios in this work. 

Encoder/Decoder Given $T$ and message space $\{0,1\}^{k_T}$, a time $T$ encoder is a function $E : m \mapsto f$ where
$m \in \{0,1\}^{k_T}$ and $f : [0,T] \to [0,1]$. Given $T$ and message space $\{0,1\}^{k_T}$, a time $T$ decoder is a function
$D : \tilde{f} \mapsto m$ where $\tilde{f} : [0,T] \to \mathbb{R}$ and $m \in \{0,1\}^{k_T}$. (More generally, encoders, channels, and decoders
should form composable functions.)

Success Criteria, Rate and Capacity The decoding error probability of the system $(E_T, D_T, \mathrm{channel}_T)$
is the quantity

$$\Pr_{\mathrm{dec},T} = \Pr_{m \in \{0,1\}^{k_T},\, \mathrm{channel}_T}\left[ m \neq D_T(\mathrm{channel}_T(E_T(m))) \right].$$

We say that the communication system is reliable if $\lim_{T \to \infty} \Pr_{\mathrm{dec},T} = 0$.

The (asymptotic) rate of a communication system is the limit $\limsup_{T \to \infty} \{k_T/T\}$. The capacity of a
channel, denoted by Cap, is defined to be the supremum of the rate of the communication system over all
encoding/decoding schemes.

2.2 Channel models 

We now move to definitions specific to our paper. We study continuous channels as a limit of discrete
channels. To make the study simple, we restrict our attention to channels whose signal strength is already
discretized, and indeed we will even restrict to the case where the channel only transmits bits. The channel
will be allowed to err, possibly probabilistically or adversarially, and $\epsilon$ will denote the error parameter.

We now move to the more interesting aspect, namely the treatment of time. Our model allows the
sender and receiver to divide every unit of time into tiny subintervals, which we call micro-intervals, of
length $\mu = 1/M$ (for some large integer $M$), and send arbitrary sequences of $M$ bits per unit of time.
This granularity is compensated for by the fact that the channel is allowed to introduce relatively large,
uncertain (random or adversarial) delays into the system, where the delays average to some fixed constant
$\Delta$ which is independent of $\mu$. Given that all aspects are scalable, we scale time so that $\Delta = 1$. Again we
distinguish between the adversarial case and the probabilistic case. In the adversarial case every transmitted
symbol may be delayed by up to 1 unit of time (or by up to $M$ microintervals). In the probabilistic case every
transmitted symbol may be delayed by an amount which is a random variable distributed exponentially with
mean 1. Finally, if multiple symbols end up arriving at the receiver at the same instant of time, we assume
the receiver receives the sum of the values of the arriving symbols.

We describe the above formally: 

Encoding: For every $T$, the sender encodes $k_T$ bits as $MT$ bits by applying an encoding function
$E_T : \{0,1\}^{k_T} \to \{0,1\}^{MT}$. The encoded sequence is denoted $X_1, \ldots, X_{MT}$.

Noise: The noise is given by a function $\xi : [MT] \to \{0,1\}$. The effect of the noise is denoted by the
sequence $Z_1, \ldots, Z_{MT}$, where $Z_j = X_j \oplus \xi(j)$. (We stress that the $Z_j$'s are not necessarily "seen" by
any physical entity; we just mention them since the notation is useful. Also, the $\oplus$ is merely a
convenient notation and is not meant to suggest that the bits are elements of some finite field. We will
be thinking of the bits as integers.)

Delay: The delay is modeled by a delay function $\Delta : [MT] \to \mathbb{Z}^{\geq 0}$, where $\mathbb{Z}^{\geq 0}$ denotes the non-negative
integers.

Received Sequence: The final sequence received by the receiver, on noise $\xi$ and delay $\Delta$, is the sequence
$Y_1, \ldots, Y_{MT}$, where $Y_i = \sum_{j \leq i \text{ s.t. } j + \Delta(j) = i} Z_j = \sum_{j \leq i \text{ s.t. } j + \Delta(j) = i} X_j \oplus \xi(j)$.

Decoding: The decoder is thus a function $D_T : (\mathbb{Z}^{\geq 0})^{MT} \to \{0,1\}^{k_T}$.

Note that while the notation suggests that the noise operates on the input first, and then the delay acts on
it, we do not view this as an operational suggestion. Indeed the order in which these functions ($\xi$ and $\Delta$) are
chosen will be crucial to our results.
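
The formal description above can be sketched in code. The following is a minimal illustration rather than the paper's construction: the names `transmit`, `random_noise`, and `random_delay` are hypothetical, indices are 0-based, symbols delayed past the last micro-interval are simply dropped, and the probabilistic delay is modeled (as an assumption) by rounding an exponential variable with mean 1 down to a whole number of micro-intervals.

```python
import random


def transmit(x, xi, delta):
    """Apply noise xi and delay delta to the bit sequence x.

    Bit j becomes z_j = x_j XOR xi_j and arrives in slot j + delta_j.
    Each output slot holds the integer sum of the bits arriving in it.
    Arrivals past the last slot are dropped (a simplification of this sketch).
    """
    n = len(x)
    y = [0] * n
    for j in range(n):
        z_j = x[j] ^ xi[j]
        i = j + delta[j]
        if i < n:
            y[i] += z_j
    return y


def random_noise(n, eps, rng):
    """Probabilistic noise: each bit is flipped independently with probability eps."""
    return [1 if rng.random() < eps else 0 for _ in range(n)]


def random_delay(n, M, rng):
    """Probabilistic delay: exponential with mean 1 time unit, in micro-intervals."""
    return [int(rng.expovariate(1.0) * M) for _ in range(n)]


# Two 1-bits sent in different micro-intervals collide in slot 2.
collided = transmit([1, 1, 0, 0], [0, 0, 0, 0], [2, 1, 0, 0])

# A random instance with M = 8 micro-intervals per unit of time and T = 2.
rng = random.Random(0)
M, T = 8, 2
x = [rng.randint(0, 1) for _ in range(M * T)]
y = transmit(x, random_noise(M * T, 0.1, rng), random_delay(M * T, M, rng))
```

Because colliding symbols are summed, the received values are non-negative integers rather than bits, which is why the decoder's domain is a sequence of non-negative integers.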

Our channels are thus described as a composition of two channels, the noise-channel with parameter $\epsilon$,
denoted $N(\epsilon)$, and the delay-channel $D$. Since each of these can be probabilistic or adversarial, this gives
us four options. Furthermore a subtle issue emerges: which channel goes first? Specifically, if
exactly one of the channels is adversarial, then does it get to choose its noise/delay before or after knowing
the randomness of the other channel? We allow both possibilities, which leads syntactically to eight possible
channels (though only six of these are distinct).

Notation: We use $D$ to denote the delay channel and $N(\epsilon)$ to denote the noise channel with parameter $\epsilon$.
We use superscripts of $A$ or $P$ to denote adversarial or probabilistic errors respectively. We use the notation
$X|Y$ to denote that the channel $X$ goes first and then $Y$ acts (with knowledge of the effects of $X$). Thus the
eight possible channels we consider are $N^P|D^P$, $D^P|N^P$, $D^A|N^P$, $N^A|D^P$, $D^P|N^A$, $N^P|D^A$, $N^A|D^A$,
and $D^A|N^A$.
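
The count of eight syntactic channels and six distinct ones can be checked mechanically. The sketch below is only an illustration; the helper names `channel_orderings` and `canonical` are hypothetical, and the merging rule encodes the observation that ordering is unimportant when both channels are of the same kind.

```python
from itertools import product


def channel_orderings():
    """List all eight syntactic channels in the X|Y notation."""
    names = []
    for first, second in (("N", "D"), ("D", "N")):
        for kind_first, kind_second in product("PA", repeat=2):
            names.append(f"{first}^{kind_first}|{second}^{kind_second}")
    return names


def canonical(name):
    """Merge the two orderings when both channels are of the same kind."""
    first, second = name.split("|")
    if first[-1] == second[-1]:
        # Both probabilistic or both adversarial: the order does not matter,
        # so pick a fixed representative (delay written first).
        return "|".join(sorted([first, second]))
    return name


all_channels = channel_orderings()
distinct_channels = sorted({canonical(name) for name in all_channels})
```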
2.3 Our results



Given that the adversarial channels are more powerful than the corresponding random channels, and an 
adversary acting with more information is more powerful than one acting with less, some obvious bounds 
on the capacity of these channels follow: 

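$$\mathrm{Cap}(D^A|N^A) = \mathrm{Cap}(N^A|D^A) \leq \mathrm{Cap}(N^P|D^A) \leq \mathrm{Cap}(D^A|N^P) \leq \mathrm{Cap}(D^P|N^P), \qquad (1)$$

and

$$\mathrm{Cap}(D^A|N^A) \leq \mathrm{Cap}(D^P|N^A) \leq \mathrm{Cap}(N^A|D^P) \leq \mathrm{Cap}(N^P|D^P) = \mathrm{Cap}(D^P|N^P). \qquad (2)$$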

The equalities above occur because if both channels are adversarial, or both are probabilistic, then ordering 
is unimportant. 

Our main results are summarized by the following two theorems. 

Theorem 2.1 (Finite Capacity Case) For every positive $\epsilon$, the capacities of the channels $D^A|N(\epsilon)^A$,
$D^P|N(\epsilon)^A$, $D^A|N(\epsilon)^P$, and $N(\epsilon)^P|D^A$ are finite. That is, for every one of these channel types and every
$\epsilon > 0$, there exists a capacity $C < \infty$ such that for every $\mu$ and every $\mu$-discretized encoder and decoder
with rate $R > C$, there exists a $\gamma > 0$ such that the probability of decoding error $\Pr_{\mathrm{dec}} \geq \gamma$.

Theorem 2.2 (Infinite Capacity Case) There exists a positive $\epsilon$ such that the capacities of the channels
$D^P|N(\epsilon)^P$ and $N(\epsilon)^A|D^P$ are infinite. That is, for every one of these channel types, there exists an $\epsilon > 0$
such that for every finite $R$, there exists a $\mu$ and a $\mu$-discretized encoder and decoder achieving rate $R$, with
decoding error probability $\Pr_{\mathrm{dec}} \to 0$.

The theorems above completely characterize the case where the capacity is infinite. The theorems show 
that the capacity is infinite if either both channels are probabilistic (the most benign case) or if the noise is 
adversarial but acts without knowledge of the randomness of the probabilistic delay. On the other hand, the 
channel capacity is finite if the delay is adversarial, or if the noise is adversarial and acts with knowledge of 
the probabilistic delay. 

Relying on the "obvious" inequalities given earlier, it suffices to give two finiteness bounds and one
"infiniteness" bound to get the theorems above, and we do so in the next two sections. Theorem 2.1 follows
immediately from Lemmas 3.2 and 3.3 (when combined with Equations (1) and (2)). Theorem 2.2 follows
immediately from Lemma 4.1 (again using Equations (1) and (2)).

3 Finite Capacity Regime 

In this section we prove that the capacity of our channels is finite when the delay channel is adversarial
(and acts without knowledge of the noise) or when the noise is adversarial and acts with knowledge of the
delay. We consider the case of the adversarial noise first, and then analyze the case of the adversarial delay
with random errors. In both cases we use a simple scheme to show the capacity is limited. We show that
with high probability, the channel can force the receiver to receive one of a limited number of signals.

The following simple lemma is then used to lower bound the probability of error. 

Lemma 3.1 Consider a transmission scheme with the sender sending a message from a set $S$ with encoding
scheme $E$, where the channel can select a set $R$ of receiver signals such that

$$\Pr_{m \in S,\, \mathrm{channel}}[\mathrm{channel}(E(m)) \notin R] \leq \tau.$$

Then the probability of decoding error is at least $1 - (\tau + |R|/|S|)$.

Proof: Let $\mathcal{R}$ denote the space of all received signals and fix the decoding function $D : \mathcal{R} \to S$. Now
consider the event that a transmitted message $m$ is decoded correctly. We claim this event occurs only if
one of the two events listed below occurs:

1. $\mathrm{channel}(E(m)) \notin R$, which happens with probability at most $\tau$.

2. $m = D(r)$ for some $r \in R$, which happens with probability at most $|R|/|S|$.

If neither of the events listed above occurs then the received signal $r \in R$ and $D(r) \neq m$, implying the
decoding is incorrect. The lemma follows immediately. □
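
As a sanity check on Lemma 3.1, the following sketch computes the exact success probability of the best possible decoder for a toy channel and compares it with the bound $\tau + |R|/|S|$. The toy channel, the message set of size 8, and all names (`best_success`, `prior`, `channel`) are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def best_success(prior, channel):
    """Success probability of the optimal decoder.

    prior[m] is Pr[m]; channel[m][r] is Pr[receive r | message m].
    For each received signal r the best decoder outputs the most likely m.
    """
    outputs = {r for dist in channel.values() for r in dist}
    return sum(
        max(prior[m] * channel[m].get(r, 0) for m in prior)
        for r in outputs
    )


# Toy setup (hypothetical): 8 equally likely messages. With probability 3/4
# the channel collapses the message to one of two signals in R; otherwise
# (probability tau = 1/4) it delivers a signal that identifies the message.
S = range(8)
tau = Fraction(1, 4)
R = {"r0", "r1"}
prior = {m: Fraction(1, len(S)) for m in S}
channel = {m: {f"r{m % 2}": 1 - tau, f"clean{m}": tau} for m in S}

success = best_success(prior, channel)
bound = tau + Fraction(len(R), len(S))
```

In this toy instance the best decoder succeeds with probability 7/16, which is below the bound 1/2, so the decoding error 9/16 is indeed at least $1 - (\tau + |R|/|S|) = 1/2$.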

3.1 Random Delay followed by Adversarial Noise ($D^P|N^A$)

Here we consider the case where the delays are random, with expectation 1, and the noise is adversarial. In 
this section, it is useful to view the delay channel as a queueing system, under the noise channel's active 
control. To explain the queueing system, notice that exponential delays lead to a memoryless queue. At 
each microinterval of time, a packet enters the queue (the new bit sent by the sender). And then each packet 
in the queue chooses to depart, independent of other packets, with probability $\mu$. (Note that the exponential
delay/memorylessness renders the packets in the queue indistinguishable in terms of their arrival times.) 
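
The queue view can be illustrated numerically. The sketch below assumes the discrete reading given above (each queued packet departs independently with probability $\mu = 1/M$ in every microinterval); the function names `survival` and `mean_delay_time_units` are hypothetical, and the truncation horizon is an arbitrary choice that makes the neglected tail negligible.

```python
def survival(M, d):
    """Pr[a packet is still queued after d microintervals], with mu = 1/M."""
    return (1 - 1 / M) ** d


def mean_delay_time_units(M, horizon_units=60):
    """Mean delay in units of time, via E[D] = sum over d >= 1 of Pr[D >= d].

    The sum is truncated after horizon_units units of time.
    """
    total = sum(survival(M, d) for d in range(1, horizon_units * M + 1))
    return total / M


M = 100
mean_delay = mean_delay_time_units(M)

# Memorylessness: having already waited 100 microintervals does not change
# the distribution of the remaining wait.
conditional = survival(M, 250) / survival(M, 100)
unconditional = survival(M, 150)
```

With $M = 100$ the mean delay is $1 - 1/M = 0.99$ units of time, which approaches the mean-1 exponential delay as $M$ grows.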

For the noise channel also, we will adopt a slightly different view. In principle, it is capable of looking
at the entire sequence of bits in the order in which they depart the queue, and then deciding which ones to flip.
However our adversary will be much milder. It will divide time into small intervals with the total number of
intervals being $N = O(T/\epsilon)$. In each interval it will "hold" most arriving packets, releasing only those that
are supposed to leave the queue. If packets are released during the interval, the noise adversary sets their
value to 0. The remaining packets it inserts into the queue, with an integer multiple of $\epsilon M/c$ of
them being set to 1 (flipping a few bits to 0 in the process as needed) for some constant $c = c(\epsilon)$. The
remaining departures from the queue will then be transmitted untampered to the receiver. We will show that
this departure process can be simulated by just the knowledge of the number of 1s injected into the queue at
the end of each interval, and the number of possibilities is just $(c+1)^N = c(\epsilon)^{O(T/\epsilon)}$, which is independent
of $M$. The adversary will be able to carry out its plan with probability $1 - \exp(-T)$, giving us the final
result. The following lemma and proof formalize this argument.

Lemma 3.2 For every positive $\epsilon$, there exists a capacity $C = C(\epsilon)$ such that the capacity of the channel
$D^P|N(\epsilon)^A$ is bounded by $C$. Specifically, for every rate $R > C$, for every $M$ (and $\mu = 1/M$), every
$T$, every $k_T \geq R \cdot T$, and every pair of encoding/decoding functions $E_T : \{0,1\}^{k_T} \to \{0,1\}^{MT}$ and
$D_T : (\mathbb{Z}^{\geq 0})^{MT} \to \{0,1\}^{k_T}$, the decoding error probability $\Pr_{\mathrm{dec}} \geq 1 - \exp(-T)$.

Proof: We start with a formal description of the channel action, and then proceed to analyze the probability 
of decoding error and channel capacity. 

Channel action: Let $X_1, \ldots, X_{MT}$ denote the $MT$-bit string being sent by the sender. We will use
$Z_j = X_j \oplus \xi(j)$ to denote the value of the $j$th bit after noise (even though the noise acts after the delay and
so $Z_j$ may not be the $j$th bit received by the receiver). We let $\Delta : [MT] \to \mathbb{Z}^{\geq 0}$ denote the delay function.

Let $\epsilon' = \epsilon/5$ and $L = \epsilon' M$. The noise adversary partitions the $MT$ microintervals into $T/\epsilon'$ intervals
of length $L$ each, where the $i$th interval is $\Gamma_i = \{(i-1)L + 1, \ldots, iL\}$. For every index $i \in \{1, \ldots, T/\epsilon'\}$,
the adversary acts as follows to set the noise function for packets from $\Gamma_i$:
1. Let $n_i$ denote the Hamming weight of the string $X_{(i-1)L+1} \cdots X_{iL}$, i.e., the weight of the arrivals in
the queue in interval $i$.

2. Let $\tilde{n}_i$ denote the rounding down of $n_i$ to an integer multiple of $\epsilon' \cdot L = (\epsilon')^2 \cdot M$ (we assume all
these are integers).

3. Let $R_i$ denote the set of packets that arrive and leave in the $i$th interval, i.e.,
$R_i = \{j \in \Gamma_i \mid j + \Delta(j) \in \Gamma_i\}$.

4. For every $j \in R_i$, the adversary sets $Z_j = 0$ (or $\xi(j) = X_j$). Let $y_i = |R_i|$ and let
$h_i = \min\{\tilde{n}_i, L - y_i\}$.

5. The adversary flips the minimum number of packets from $\Gamma_i \setminus R_i$ so that exactly $h_i$ of these are ones.

If at any stage the adversary exceeds its quota of $\epsilon MT$ errors it stops flipping any further bits.

Error Analysis: We claim first that the probability that the adversary stops due to injecting too many errors
is exponentially low. This is straightforward to bound. Notice that the number of bits flipped in the $i$th
interval due to early departures is at most $y_i$, and $E[y_i] \leq \frac{1}{2}\epsilon' L$. The number of bits flipped for packets that
wait in the queue (i.e., from $\Gamma_i \setminus R_i$) is at most $\max\{y_i, n_i - \tilde{n}_i\}$. Again the expectation of this is bounded
by the expectation of $y_i + (n_i - \tilde{n}_i)$, which is at most $\frac{3}{2}\epsilon' L$. Thus adding up the two kinds of errors, we
find the expected number of bits flipped in the $i$th interval is at most $2\epsilon' L$. Summing over all intervals and
applying Chernoff bounds, we find the probability that we flip more than a $(5\epsilon' = \epsilon)$-fraction of the bits is
exponentially small in $T$.

Capacity Analysis: For the capacity analysis, we first note that the departure process from the delay queue
(after the $\xi$ function has been set) is completely independent of the encoding $X_1, \ldots, X_{MT}$, conditioned on
$\tilde{n}_1, \ldots, \tilde{n}_{T/\epsilon'}$ and on the event that the adversary does not exceed its noise bounds. Indeed for any fixing
of the $\Delta$ function where the adversary does not exceed the noise bound, the output of the $D^P|N^A$ channel on
$X_1, \ldots, X_{MT}$ is the same as on the string $X'_1 \ldots X'_{MT}$, where for each $i$, the string
$X'_{(i-1)L+1} \ldots X'_{iL}$ is set to $1^{\tilde{n}_i} 0^{L - \tilde{n}_i}$. Furthermore, note that the number of possible values of $\tilde{n}_i$ is
at most $1/\epsilon'$. We thus conclude that with all but exponentially small probability, the number of distinct
distributions received by the receiver (which overcounts the amount of information received by the receiver)
is at most $(1/\epsilon')^{T/\epsilon'} = (1/\epsilon)^{O(T/\epsilon)}$.

An application of Lemma 3.1 now completes the proof.

□ 
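
To make the counting in the capacity analysis concrete, the sketch below evaluates how many bits per unit of time are needed to index the adversary's rounded weights, taking the counts stated in the proof at face value ($1/\epsilon'$ intervals per unit of time and at most $1/\epsilon'$ values per interval, with $\epsilon' = \epsilon/5$). The function name `signature_rate_bound` is hypothetical, and the resulting number is only an illustration of the order of magnitude of $C(\epsilon)$, not a constant claimed by the paper.

```python
from math import log2


def signature_rate_bound(eps):
    """Bits per unit of time needed to index the rounded weights.

    Uses eps' = eps / 5 as in the proof: there are 1/eps' intervals per
    unit of time, each contributing at most 1/eps' possible values, so
    log2((1/eps')^(T/eps')) / T = (1/eps') * log2(1/eps').
    """
    eps_prime = eps / 5
    intervals_per_unit = 1 / eps_prime
    values_per_interval = 1 / eps_prime
    return intervals_per_unit * log2(values_per_interval)


rate_at_5_percent = signature_rate_bound(0.05)
```

The parameter $M$ does not appear anywhere in the computation, which reflects the point of the argument: refining the discretization does not increase the number of distinguishable outcomes.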

3.2 Adversarial Delay followed by Random Noise ($D^A|N^P$)

Lemma 3.3 For every positive $\epsilon < \frac{1}{2}$, there exists a capacity $C = C(\epsilon)$ such that the capacity of the
channel $D^A|N(\epsilon)^P$ is bounded by $C$. Specifically, for every rate $R > C$, there exists a $\gamma > 0$ and $T_0 < \infty$
such that for every $M$ (and $\mu = 1/M$), every $T \geq T_0$, and every pair of encoding/decoding functions
$E_T : \{0,1\}^{k_T} \to \{0,1\}^{MT}$ and $D_T : (\mathbb{Z}^{\geq 0})^{MT} \to \{0,1\}^{k_T}$, the decoding error probability
$\Pr_{\mathrm{dec}} \geq \gamma$ if $k_T \geq R \cdot T$.

Proof Idea: We give the capacity upper bound in two steps. In the first step we create an adversarial delay 
function that attempts to get rid of most of the "detailed" information being sent over the channel. The effect 
of this delay function is that most of the information being carried by the channel in M microintervals can 
be reduced to one of a constant (depending on e) number of possibilities - assuming the errors act as they 
are expected to do. The resulting process reduces the information carrying capacity of the channel to that of 
a classical-style (discrete, memoryless) channel, and we analyze the capacity of this channel in the second
step. We give a few more details below to motivate the definition of this classical channel. 

We think of the delay function as a "queue", and the bits being communicated as "packets"
arriving/departing from this queue. We call a packet a 0-packet if it was a zero under the encoding and a
1-packet if it was a one under the encoding. Note that both types of packets, on release, get flipped with
probability $\epsilon$ and the receiver receives one integer per time step representing the total number of ones
received. The delay adversary clusters time into many large intervals and holds on to all packets received
during an interval, and releases most of them at the end of the interval. In particular if it releases $n_0$
0-packets and $n_1$ 1-packets at the end of an interval, it makes sure that $\epsilon n_0 + (1-\epsilon) n_1$ takes on one of a
"constant" number of values independent of $M$. (The actual value will be within $\pm\frac{1}{2}$ of an integer multiple
of $M/c$ due to integrality issues, but in this discussion we pretend we get an exact multiple of $M/c$.) Note
that the quantity $\epsilon n_0 + (1-\epsilon) n_1$ denotes the expected value of the signal received by the receiver when
$n_0$ 0-packets and $n_1$ 1-packets are released, and so we refer to this quantity as the signature of the interval.
If the errors were "deterministic" and flipped exactly the expected number of bits, then the channel would
convey no information beyond the signature, and the total number of possible signatures over the course of
all intervals would dictate the number of possible messages that could be distinguished from each other.

However the errors are not "deterministic" (indeed, it is not even clear what that would mean!). They
are simply Bernoulli flips of the bits being transmitted, and it turns out that different pairs $(n_0, n_1)$ with the
same signature can be distinguished by the receiver due to the fact that they have different variance. This
forces us to quantify the information carrying capacity of this "signal-via-noise" channel.

In the sequel, we first introduce this "signal-via-noise" channel (Definition 3.5) and bound its capacity
(Lemmas 3.6 and 3.10). We then use this bound to give a proof of Lemma 3.3.

3.2.1 The "signal-via-noise" channel 

We introduce the "signal-via-noise" channel which is a discrete memoryless channel, whose novelty is in 
the fact that it attempts to convey information using the variance of the signal. We recall below some basic 
definitions from information theory which we will use to bound the capacity of this channel. (These can
also be found in [2, Chapter 2].)

Let $X$ be a random variable taking values from some set $\mathcal{X}$, and let $p_x$ denote the probability that $X = x$.
Then the entropy of $X$, denoted $H(X)$, is the quantity $H(X) = \sum_{x \in \mathcal{X}} p_x \log(1/p_x)$.

Let $X$ and $Y$ be jointly distributed random variables with $X$ taking values from $\mathcal{X}$ and $Y$ from $\mathcal{Y}$. Let
$p_{x,y}$ denote the probability that $X = x$ and $Y = y$. For $y \in \mathcal{Y}$, let $H(X|y)$ denote the entropy of $X$
conditioned on $Y = y$, that is, $H(X|y) = \sum_{x \in \mathcal{X}} p_{x|y} \log(1/p_{x|y})$ where $p_{x|y} = p_{x,y}/(\sum_{z \in \mathcal{X}} p_{z,y})$ denotes
the probability that $X = x$ conditioned on $Y = y$. Then the conditional entropy of $X$ given $Y$, denoted
$H(X|Y)$, is the quantity $H(X|Y) = E_{y \leftarrow Y}[H(X|y)]$. The mutual information between $X$ and $Y$, denoted
$I(X;Y)$, is the quantity $I(X;Y) = H(X) - H(X|Y)$. We will rely on the following basic fact.

Proposition 3.4 ([2, Chapter 2])

1. $H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)$.

2. $I(X;Y) = I(Y;X) = H(Y) - H(Y|X)$.
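
The definitions and Proposition 3.4 can be checked numerically on a small joint distribution. The sketch below is illustrative only: the function names and the example distribution `joint` are assumptions, and logarithms are taken base 2.

```python
from math import log2


def entropy(dist):
    """H = sum over outcomes of p * log2(1/p), skipping zero-probability outcomes."""
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)


def marginal(joint, axis):
    """Marginal distribution of coordinate `axis` of a joint distribution."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out


def conditional_entropy(joint, given_axis):
    """Entropy of the other coordinate given coordinate `given_axis`."""
    total = 0.0
    for value, p_value in marginal(joint, given_axis).items():
        if p_value == 0:
            continue
        cond = {pair: p / p_value for pair, p in joint.items()
                if pair[given_axis] == value}
        total += p_value * entropy(cond)
    return total


def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y), with X on axis 0 and Y on axis 1."""
    return entropy(marginal(joint, 0)) - conditional_entropy(joint, 1)


# Example joint distribution of (X, Y) on {0,1} x {0,1}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

h_xy = entropy(joint)
h_x, h_y = entropy(marginal(joint, 0)), entropy(marginal(joint, 1))
h_y_given_x = conditional_entropy(joint, 0)
h_x_given_y = conditional_entropy(joint, 1)
```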

A discrete channel $C$ is given by a triple $(\mathcal{X}, \mathcal{Y}, \mathcal{P})$, where $\mathcal{X}$ denotes the finite set of input symbols, $\mathcal{Y}$
denotes the finite set of output symbols, and $\mathcal{P}$ is a stochastic matrix with $\mathcal{P}_{ij}$ denoting the probability that
the channel outputs $j \in \mathcal{Y}$ given $i \in \mathcal{X}$ as input. We use $C(X)$ to denote the output of this channel on input
$X$. The information capacity of such a channel is defined to be the maximum, over all distributions $\mathcal{D}$ on
$\mathcal{X}$, of the mutual information $I(X;Y)$ between $X$ drawn according to $\mathcal{D}$ and $Y = C(X)$.

The information capacity turns out to capture the operational capacity (or just capacity as introduced in
Section 2) of a channel when it is used many times (see Lemma 3.10 below). Our first lemma analyzes the
information capacity of the "signal-via-noise" channel, which we define formally below.

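In what follows, we fix positive integers $M$ and $c$ and a rational $\epsilon$.

Definition 3.5 For integers $M, c$ and $\epsilon > 0$, the collection of $(M, \epsilon, c)$-channels is given by
$\{C_\mu \mid \mu \in \{M/c, \ldots, M\}\}$, where the channel $C_\mu = (\mathcal{X}_\mu, \mathcal{Y}_\mu, \mathcal{P}_\mu)$ is given by

$$\mathcal{X}_\mu = \{(a,b) \mid \mu - \tfrac{1}{2} \leq \epsilon a + (1-\epsilon) b < \mu + \tfrac{1}{2},\ 0 \leq a + b \leq M\}, \qquad \mathcal{Y}_\mu = \{0, \ldots, M\},$$

and $\mathcal{P}_\mu$ is the distribution that, on input $(a,b)$, outputs the random variable $Y = \sum_{i=1}^{a} U_i + \sum_{j=1}^{b} V_j$,
where the $U_i$'s and $V_j$'s are independent Bernoulli random variables with $E[U_i] = \epsilon$ and $E[V_j] = 1 - \epsilon$.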

Note that the expectation of the output of the channel $C_\mu$ is roughly $\mu$, and the only "information carrying
capacity" is derived from the fact that the distribution over $\{0, \ldots, M\}$ is different (and in particular has
different variance) depending on the choice of $(a,b) \in \mathcal{X}_\mu$. The following lemma shows that this information
carrying capacity is nevertheless bounded as a function of $\epsilon$ and $c$ (independent of $M$). Later we follow this
lemma with a standard one from information theory showing that the information capacity does bound the
functional capacity of this channel.
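
The following sketch computes the exact output distribution of the "signal-via-noise" channel for a few inputs with the same signature, to illustrate that they share a mean but differ in variance. The function names and the particular values ($\epsilon = 1/4$, signature 15) are illustrative assumptions.

```python
from math import comb


def binomial_pmf(n, p):
    """pmf of a sum of n independent Bernoulli(p) variables."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(u, v):
    """pmf of the sum of two independent integer variables with pmfs u and v."""
    out = [0.0] * (len(u) + len(v) - 1)
    for i, p in enumerate(u):
        for j, q in enumerate(v):
            out[i + j] += p * q
    return out


def output_pmf(a, b, eps):
    """pmf of Y = (sum of a Bernoulli(eps)) + (sum of b Bernoulli(1 - eps))."""
    return convolve(binomial_pmf(a, eps), binomial_pmf(b, 1 - eps))


def mean_and_variance(pmf):
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, var


eps = 0.25
# Three inputs (a, b) with the same signature eps*a + (1 - eps)*b = 15.
stats = {ab: mean_and_variance(output_pmf(ab[0], ab[1], eps))
         for ab in [(0, 20), (30, 10), (60, 0)]}
```

All three inputs produce an output with mean 15, but the variances are 3.75, 7.5, and 11.25 respectively, since the variance equals $(a+b)\epsilon(1-\epsilon)$. This difference is the only handle the receiver has for telling such inputs apart.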

Lemma 3.6 For every $0 < \epsilon < \frac{1}{2}$ and $c < \infty$, there exists $C_0 = C_0(c, \epsilon)$ such that for all $M$ the information
capacity of every $(M, \epsilon, c)$-channel is at most $C_0$.

Proof: The lemma follows from the basic identity, valid for any pair of random variables $X$ and $Y$, that
$I(X;Y) = H(Y) - H(Y|X)$, where $H(\cdot)$ denotes the entropy function and $H(\cdot|\cdot)$ denotes the
conditional entropy function. Thus to upper bound the capacity it suffices to give a lower bound on $H(Y|X)$ and
an upper bound on $H(Y)$.

We prove below some rough bounds that suffice for us. Claim 3.7 proves $H(Y|X) \geq \frac{1}{2}\log_2 M - C_1(c,\epsilon)$
and Claim 3.9 proves $H(Y) \leq \frac{1}{2}\log_2 M + C_2(c,\epsilon)$. It immediately follows that the capacity of
the channel is at most $C_1(c,\epsilon) + C_2(c,\epsilon)$. We now proceed to prove Claims 3.7 and 3.9.

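Claim 3.7 There exists $C_1(c,\epsilon)$ such that for every $(a,b) \in \mathcal{X}_\mu$, $H(Y|X = (a,b)) \geq \frac{1}{2}\log_2 M - C_1(c,\epsilon)$.

Proof: This part follows immediately from the following claim, which asserts that for every $j \in \mathcal{Y}_\mu$,
$\Pr[Y = j \mid X = (a,b)] \leq 8(c/\epsilon)^{3/2} M^{-1/2}$. We thus conclude

$$H(Y \mid X = (a,b)) \geq \min_{j} \log \frac{1}{\Pr[Y = j \mid X = (a,b)]} \geq \frac{1}{2}\log M - \frac{3}{2}\log\left(\frac{c}{\epsilon}\right) - 3.$$

Claim 3.8 For every $j \in \mathcal{Y}_\mu$, $\Pr[Y = j \mid X = (a,b)] \leq 8(c/\epsilon)^{3/2} M^{-1/2}$.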

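Proof: We use the Berry-Esseen theorem [3, Chapter 16], and in particular the following version of the
theorem.
If $Z_1, \ldots, Z_\ell$ are independent random variables with mean zero such that $\sum_{i=1}^{\ell} E[Z_i^2] = \sigma^2$ and $\sum_{i=1}^{\ell} E[|Z_i|^3] \le \rho$,
then for every $\alpha \in \mathbb{R}$,

\[ \left| \Pr\left[ \frac{\sum_{i=1}^{\ell} Z_i}{\sigma} \le \alpha \right] - \Phi(\alpha) \right| \le \frac{\rho}{\sigma^3}. \]

An immediate implication is that for $\alpha < \beta$, we have

\[ \Pr\left[ \alpha < \frac{\sum_{i=1}^{\ell} Z_i}{\sigma} \le \beta \right] \le \Phi(\beta) - \Phi(\alpha) + \frac{2\rho}{\sigma^3}. \]

In our setting, $\ell = a + b$, and $Z_i = U_i - \epsilon$ for $1 \le i \le a$ and $Z_i = V_{i-a} - (1 - \epsilon)$ for $a + 1 \le i \le \ell$. We
have $\sigma^2 = (a + b)\epsilon(1 - \epsilon) \ge \epsilon(1 - \epsilon)M/c \ge \epsilon M/(2c)$. Finally $\rho$ can be crudely upper bounded by $M$
(since $\ell \le M$ and $-1 \le Z_i \le 1$). Let $\mu = \epsilon a + (1 - \epsilon)b$. Then, if we set $\beta = (j - \mu)/\sigma$ and let $\alpha$ tend to $\beta$ from below, we find that
$\Pr[\sum_{i=1}^{\ell} Z_i = j - \mu] \le 2\rho/\sigma^3 \le 8(c/\epsilon)^{3/2} M^{-1/2}$. □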

□ 
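
The point-mass bound of Claim 3.8 can be sanity-checked numerically. The sketch below is only an illustration under assumed parameters ($M = 900$, $\epsilon = 0.49$, $c = 1$, input $(500, 400)$); these values are chosen so that the right-hand side is below 1, since for small $M$ or large $c/\epsilon$ the bound is vacuous. The helper names are hypothetical.

```python
from math import exp, lgamma, log

def binomial_pmf(n, p):
    """pmf of Binomial(n, p), computed in log-space to avoid overflow."""
    return [exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(p) + (n - k) * log(1 - p)) for k in range(n + 1)]

def channel_pmf(a, b, eps):
    """pmf of Y = Binomial(a, eps) + Binomial(b, 1 - eps) by direct convolution."""
    pu, pv = binomial_pmf(a, eps), binomial_pmf(b, 1 - eps)
    out = [0.0] * (a + b + 1)
    for i, x in enumerate(pu):
        for j, y in enumerate(pv):
            out[i + j] += x * y
    return out

def point_mass_bound(M, eps, c):
    """Right-hand side of Claim 3.8: 8 (c/eps)^{3/2} M^{-1/2}."""
    return 8 * (c / eps) ** 1.5 * M ** -0.5

M, eps, c = 900, 0.49, 1.0        # illustrative values only
a, b = 500, 400                   # an input with a + b >= M / c
pmf = channel_pmf(a, b, eps)
largest_mass = max(pmf)
bound = point_mass_bound(M, eps, c)
print(largest_mass, bound)
```

In this example the largest point mass is far below the stated bound, consistent with the proof's remark that the constants are crude.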



Claim 3.9 There exists $C_2$ such that for every random variable $X$ supported on $\mathcal{X}_\mu$ and $Y = C_\mu(X)$, we
have $H(Y) \le \frac{1}{2}\log_2 M + C_2$.

Proof: Let $\sigma$ be such that $\sigma^2 = M\epsilon(1 - \epsilon)$ is an upper bound on the variance of $Y$ conditioned on $X$.
We use this upper bound to bound the probability of $Y$ being too far from $\mu$, and in turn use this to bound
its entropy.

We partition $\mathcal{Y}_\mu$ into a sequence of sets $S_0, S_2, \ldots, S_{\sigma-1}, S_\infty$ as defined below.

\[ S_0 = \{j \in \mathcal{Y}_\mu \text{ s.t. } |j - \mu| < 2\sigma\}, \]
\[ S_\infty = \{j \in \mathcal{Y}_\mu \text{ s.t. } |j - \mu| \ge \sigma^2\}, \]
\[ \text{and } S_i = \{j \in \mathcal{Y}_\mu \text{ s.t. } i \cdot \sigma \le |j - \mu| < (i + 1) \cdot \sigma\}, \text{ for } i \in \{2, \ldots, \sigma - 1\}. \]

Let $h_0, h_2, \ldots, h_{\sigma-1}$ and $h_\infty$ denote the contribution of $S_0, S_2, \ldots, S_{\sigma-1}$ and $S_\infty$ to the entropy of $Y$,
i.e., $h_i = \sum_{j \in S_i} \Pr[Y = j] \log \frac{1}{\Pr[Y = j]}$ (for $i \in \{0, 2, \ldots, \sigma - 1, \infty\}$). Similarly, let $p_i = \Pr[Y \in S_i]$.

Note that we have $H(Y) = h_0 + \sum_{i=2}^{\sigma-1} h_i + h_\infty$. We bound these separately below, using rough
approximations on $p_i$.

We start with a basic fact. For any set $S$, let $\Pr[Y \in S] \le p_S$ for some $p_S \in (0, 1]$. Then

\[ h_S = \sum_{j \in S} \Pr[Y = j] \log \frac{1}{\Pr[Y = j]} \le p_S \log(|S|/p_S). \]

(Follows easily from the concavity of the entropy function and Jensen's inequality.)
This immediately yields our first bound, using $|S_0| \le 4\sigma$ and $p_0 \le 1$. We have

\[ h_0 \le p_0 \log(4\sigma/p_0) \le p_0 \log \sigma + 2. \qquad (3) \]
For the remaining parts we use the following Chernoff-like bound from [6, Theorem 7.2.1]. (Among the many such
bounds available, this one allows variables to be non-identically distributed.)

\[ \Pr[|Y - \mu| \ge \ell\sigma] \le e^{-\ell^2/4} \le 2^{-\ell}, \]

for $2 \le \ell \le \sigma$. Thus $p_i \le 2^{-i}$ for $i \in \{2, \ldots, \sigma - 1\}$, and $p_\infty \le 2^{-\sigma}$.
For $i \in \{2, \ldots, \sigma - 1\}$, we can thus bound $h_i$ by

\[ h_i \le p_i \log(\sigma/p_i) \le p_i \log \sigma + i 2^{-i}, \qquad (4) \]

and bound $h_\infty$ by

\[ h_\infty \le \sigma \cdot 2^{-\sigma} = O(\exp(-\sqrt{M})) = O(1). \qquad (5) \]
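Combining Equations (3), (4) and (5) we get

\begin{align*}
H(Y) &= h_0 + \sum_{i=2}^{\sigma-1} h_i + h_\infty \\
&\le p_0 \log \sigma + 2 + \sum_{i=2}^{\sigma-1} \left(p_i \log \sigma + i 2^{-i}\right) + O(1) \\
&\le \left(p_0 + \sum_{i=2}^{\sigma-1} p_i\right) \log \sigma + \sum_{i=0}^{\infty} i 2^{-i} + O(1) \\
&\le \log \sigma + O(1).
\end{align*}

The claim now follows from the fact that $\sigma \le \sqrt{M}$. □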

□ 
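
To see the two entropy bounds at work, the following sketch computes $H(Y \mid X = (a, b))$ in bits for a few fixed inputs and compares it with $\frac{1}{2}\log_2 M$. The parameters ($M = 400$, $\epsilon = 0.25$), the list of inputs, and the slack of 2 bits used in the comparison are assumptions for illustration; they are not the constants $C_1$, $C_2$ of the claims.

```python
from math import exp, lgamma, log, log2

def binomial_pmf(n, p):
    """pmf of Binomial(n, p), computed in log-space."""
    return [exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(p) + (n - k) * log(1 - p)) for k in range(n + 1)]

def channel_pmf(a, b, eps):
    """pmf of Y = Binomial(a, eps) + Binomial(b, 1 - eps)."""
    pu, pv = binomial_pmf(a, eps), binomial_pmf(b, 1 - eps)
    out = [0.0] * (a + b + 1)
    for i, x in enumerate(pu):
        for j, y in enumerate(pv):
            out[i + j] += x * y
    return out

def entropy_bits(pmf):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * log2(p) for p in pmf if p > 0)

M, eps = 400, 0.25                                   # illustrative values only
inputs = [(200, 200), (100, 300), (400, 0), (50, 150)]
entropies = [entropy_bits(channel_pmf(a, b, eps)) for a, b in inputs]
half_log_M = 0.5 * log2(M)
print(half_log_M, entropies)
```

All four entropies land within a constant of $\frac{1}{2}\log_2 M$, which is the reason the difference $H(Y) - H(Y|X)$ stays bounded as $M$ grows.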

Our analysis of the $D^A|N^P$ channel immediately yields a lower bound on the "operational capacity" of
any sequence of channels $\{C_{\mu_i}\}_i$. Standard bounds in information theory (see, for instance, [2, Chapter
8, Theorem 8.7.1]) imply immediately that a bound on the capacity also implies that any attempt to communicate
at rate greater than capacity leads to error with positive probability. We summarize the resulting
consequence below. (We note that while the theorem in [2] only considers a single channel and not a collection
of channels, the proof goes through with only notational changes to cover a sequence of channels.)

Lemma 3.10 Transmission at rate $R$ greater than $C_0$, the information capacity, leads to error with positive
probability. More precisely, for any $0 < \epsilon < 1/2$, let $C_0 = C_0(\epsilon, c)$ be an upper bound on the information
capacity of a collection of channels $\{C_\mu\}_\mu$. Then for every $R > C_0$, there exists a $\gamma_0 > 0$ and $N_0 < \infty$
such that for every $N \ge N_0$ the following holds: For every sequence $\{C_{\mu_i}\}_{i=1}^{N}$ of $(M, \epsilon, c)$-channels, and
every encoding and decoding pair $E : \{0, 1\}^{RN} \to \prod_{i=1}^{N} \mathcal{X}_{\mu_i}$ and $D : \{0, \ldots, M\}^N \to \{0, 1\}^{RN}$, the
probability of decoding error $\Pr_{dec} \ge \gamma_0$.



3.2.2 Proof of Lemma 3.3



Proof: We now formally describe the delay adversary and analyze the channel capacity. Let $c = 4/\epsilon$ and
let $C_0 = C_0(\epsilon, c)$ be the bound on the capacity of $(M, \epsilon, c)$-channels from Lemma 3.6. We prove the
lemma for $C(\epsilon, c) = 2(C_0 + \log c)$.

Delay: Let $X_1, \ldots, X_{MT}$ denote the encoded signal the sender sends. The noise channel picks $\xi(j)$ independently
for each $j$ with $\xi(j)$ being 1 w.p. $\epsilon$. We now describe the action of the delay channel (which acts
without knowledge of $\xi$).

We divide time into $2T$ intervals, with the $i$th interval denoted $\Gamma_i = \{(i - 1)(M/2) + 1, \ldots, i(M/2)\}$.
Let $n_1(i) = \sum_{j \in \Gamma_i} X_j$ and $n_0(i) = M/2 - n_1(i)$ denote the number of 1-packets and 0-packets that arrive
in the queue in the $i$th interval. The delay adversary acts as follows:

1. Initialize $n'_1(1) = n_1(1)$ and $n'_0(1) = n_0(1)$.

2. For $i = 1$ to $2T$ do the following:

(a) If $n_1(i) \ge n_0(i)$ then set $\tilde{n}_0(i) = n'_0(i)$ and round $n'_1(i)$ down to $\tilde{n}_1(i)$ so that $(1 - \epsilon)\tilde{n}_1(i) + \epsilon\tilde{n}_0(i)$
is within $\frac{1}{2}$ of the nearest integer multiple of $M/c$.

(b) Else let $\tilde{n}_1(i) = n'_1(i)$ and round $n'_0(i)$ down to $\tilde{n}_0(i)$ so that $(1 - \epsilon)\tilde{n}_1(i) + \epsilon\tilde{n}_0(i)$ is within $\frac{1}{2}$
of the nearest integer multiple of $M/c$.

(c) Finally set $n'_0(i + 1) = n_0(i + 1) + n'_0(i) - \tilde{n}_0(i)$ and $n'_1(i + 1) = n_1(i + 1) + n'_1(i) - \tilde{n}_1(i)$.

(d) At the end of interval $i$, output $\tilde{n}_0(i)$ 0-packets and $\tilde{n}_1(i)$ 1-packets from the queue to the noise
adversary. Formally, the delay channel outputs a set $\Lambda_i$ of packets that are to be released at the
end of interval $\Gamma_i$, where $\Lambda_i$ includes all packets that arrived in $\Gamma_{i-1}$ but were not included in
$\Lambda_{i-1}$.

(e) The noise adversary simply flips the bits according to the noise function and outputs the sum of
these bits. Specifically it sets $Z_j = X_j \oplus \xi(j)$ and outputs $Y_i = \sum_{j \in \Lambda_i} Z_j$.
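
The release rule of steps (a) to (c) can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: the function names, the tolerance of $1/2$, the use of exact fractions, and the example parameters ($M = 40$, $\epsilon = 1/4$, a pseudo-random input) are all assumptions of the sketch. It tracks only the counts of queued and released packets, not their identities.

```python
from fractions import Fraction
import random

def round_down(queued, fixed, w_queued, w_fixed, step, tol):
    """Largest k <= queued such that w_queued*k + w_fixed*fixed lies
    within tol of an integer multiple of step."""
    for k in range(queued, -1, -1):
        value = w_queued * k + w_fixed * fixed
        nearest_multiple = round(value / step) * step
        if abs(value - nearest_multiple) <= tol:
            return k
    raise ValueError("no admissible release count")

def delay_adversary(bits, M, eps, tol=Fraction(1, 2)):
    """Return, per interval of M/2 microintervals, the pair
    (released 0-packets, released 1-packets)."""
    step = M * eps / 4            # M / c with c = 4 / eps
    half = M // 2
    queue0 = queue1 = 0           # packets waiting: n'_0(i), n'_1(i)
    released = []
    for start in range(0, len(bits), half):
        n1 = sum(bits[start:start + half])
        n0 = half - n1
        queue0, queue1 = queue0 + n0, queue1 + n1
        if n1 >= n0:              # step (a): release all 0s, round the 1s down
            out0 = queue0
            out1 = round_down(queue1, out0, 1 - eps, eps, step, tol)
        else:                     # step (b): release all 1s, round the 0s down
            out1 = queue1
            out0 = round_down(queue0, out1, eps, 1 - eps, step, tol)
        queue0, queue1 = queue0 - out0, queue1 - out1   # step (c): carry over
        released.append((out0, out1))
    return released

M, eps = 40, Fraction(1, 4)       # illustrative values only
rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(M * 3)]   # T = 3, i.e. 2T = 6 intervals
released = delay_adversary(bits, M, eps)
print(released)
```

Every released pair has $(1 - \epsilon)\tilde{n}_1 + \epsilon\tilde{n}_0$ close to a multiple of $M/c$, which is what limits the number of distinct signatures the receiver can observe.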



Analysis: We start by establishing that the delay adversary never delays any packet by more than $M$
microintervals. Note that the number of packets that arrive in interval $i$, but are not released at the end of the
interval is given by $(n'_1(i) - \tilde{n}_1(i)) + (n'_0(i) - \tilde{n}_0(i))$. One of the two summands is zero by construction,
and the other is at most $M/(\epsilon c) \le M/4$ by our construction. Since the total number of packets arriving in
an interval is $M/2$, this ensures that the total number released in an interval is never more than $3M/4 \le M$
(as required for an $(M, \epsilon, c)$-channel). Next we note that packets delayed beyond their release interval do
get released in the next interval. Again, suppose $n_0(i) \ge n_1(i)$. Then all 1-packets are released in interval
$i$. And the number of 0-packets held back is at most $M/(\epsilon c) \le M/4$, which is no more than $n_0(i)$, the total number of
0-packets arriving in interval $\Gamma_i$. Thus the adversary never delays any packet more than $M$ microintervals,
and the numbers of packets released in all intervals (except the final one) are such that $\epsilon\tilde{n}_0(i) + (1 - \epsilon)\tilde{n}_1(i)$ is an
integer multiple of $M/c$.

For an encoded message $X_1, \ldots, X_{MT}$, let $\mu_i = [\epsilon\tilde{n}_0(i) + (1 - \epsilon)\tilde{n}_1(i)]$, where the notation $[x]$ indicates
the nearest integer to $x$, denote the signature of the $i$th interval; and let $\vec{\mu} = (\mu_1, \ldots, \mu_{2T})$ denote its
signature. Note that $\mu_i$ takes one of at most $c$ distinct values (since it is between $M/c$ and $M$ and always an
integer multiple of $M/c$). Thus the number of signatures is at most $c^{2T}$.

Now since the total number of distinct messages is $2^{RT}$, the average number of messages with a given
signature sequence is at least $2^{RT}/c^{2T}$. Furthermore, with probability at least $1 - \delta$, a random message is
mapped to a signature sequence with at least $\delta 2^{RT}/c^{2T}$ preimages. Suppose that such an event happens.
Then, using the fact that $R > 2(C_0 + \log c) - \frac{1}{T}\log \delta$, we argue below that conditioned on this event the
probability of correct decoding is at most $1 - \gamma_0$ (where $\gamma_0 > 0$ is the constant from Lemma 3.10). This
yields the lemma for $\gamma = (1 - \delta)\gamma_0$.

To see this, note that the signal $Y_1, \ldots, Y_{2T}$ received by the receiver is exactly the output of the channel
sequence $\{C_{\mu_i}\}_{i=1}^{2T}$ on input $X_1, \ldots, X_{2T}$ where $\mu_i = (1 - \epsilon)\tilde{n}_1(i) + \epsilon\tilde{n}_0(i)$. If the receiver decodes the
message (more precisely, its encoding) $X_1, \ldots, X_{MT}$ correctly from $Y_1, \ldots, Y_{2T}$, then we can also compute
the sequence $X_1, \ldots, X_{2T}$ correctly (since the delay adversary is just a deterministic function of its input
$X_1, \ldots, X_{MT}$). Thus correct decoding of the $D^A|N^P(\epsilon)$ channel also leads to a correct decoding of the
channel sequence $\{C_{\mu_i}\}$. But the number of distinct messages being transmitted to this channel is $\delta 2^{RT}/c^{2T}$.
Denoting this by $2^{R' \cdot 2T}$ and using the fact that $R' > C_0$, we get that the channel must err with probability at
least $\gamma_0$. □
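
The rate bookkeeping used in the proof above can be spelled out as a short derivation. Here $R'$ denotes the per-use rate on the sequence of $2T$ channels, and all logarithms are base 2; the symbol $R'$ is notation introduced for this sketch.

```latex
\[
\frac{\delta\, 2^{RT}}{c^{2T}} = 2^{R' \cdot 2T}
\quad\Longleftrightarrow\quad
R' = \frac{R}{2} - \log c + \frac{\log \delta}{2T},
\]
\[
R' > C_0
\quad\Longleftrightarrow\quad
R > 2(C_0 + \log c) - \frac{1}{T}\log \delta .
\]
```

Since $\log \delta < 0$, the correction term $-\frac{1}{T}\log\delta$ is positive but vanishes as $T$ grows, which is why the threshold $2(C_0 + \log c)$ suffices asymptotically.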
4 Infinite Capacity Regime



In this section we show that the capacity of the channel with adversarial noise followed by random delay
($N^A|D^P$) is infinite. Specifically, we establish the following result:

Lemma 4.1 There exists a positive $\epsilon$, such that the capacity of the channel $N(\epsilon)^A|D^P$ is unbounded. Specifically,
for every rate $R$, there exists a constant $M$ (and $\mu = 1/M$), such that for sufficiently large $T$, there
exist encoding and decoding functions $E_T : \{0, 1\}^{k_T} \to \{0, 1\}^{MT}$ and $D_T : (\mathbb{Z}^{\ge 0})^{MT} \to \{0, 1\}^{k_T}$ such that the
decoding error probability $\Pr_{dec} \le \exp(-T)$, with $k_T = R \cdot T$.

Proof Idea: The main idea here is that the encoder encodes a 0 by a series of 0s followed by a series of 1s
and a 1 by a series of 1s followed by a series of 0s. Call such a pair of series a "block". If the noise adversary
doesn't corrupt too many symbols within such a block (and it can't afford to do so for most blocks), then
the receiver can distinguish the two settings by seeing if the fraction of 1s being received went up in the
middle of the block and then went down, or the other way around. This works with good enough probability
(provided the delay queue has not accumulated too many packets) to allow a standard error-correcting code
to now be used by sender and receiver to enhance the reliability.

Proof: We will prove the lemma for e < 1/64 below. I Let A; = A;^ = RT. We will set M = 0{R^). Let 
L = M^/^, and L' = M^/^, = {(i - 1)L + 1, . . . , iL}, and T'^ = {iL - L' + 1, . . . , iL}. As a building 
block for our sender-receiver protocol, we will use a pair of classical encoding and decoding algorithms, $E'$
and $D'$, that can handle up to a $5/24$-fraction of adversarial errors. (Note that $5/24$ could be replaced with any
constant less than $1/4$.) In particular, for each message $m \in \{0, 1\}^k$, the algorithm $E'$ outputs an encoding
$E'(m)$ of length $N = O(k)$ such that for any binary string $s$ of length $N$ that differs from $E'(m)$ in at most
$(5/24)N$ locations, $D'(s) = m$. We now describe our encoding and decoding protocols.

Sender Protocol: The encoding $E = E_T$ works as follows. Let $m \in \{0, 1\}^k$ be the message that the sender
wishes to transmit. The encoding $E(m)$ simply replaces every 0 in $E'(m)$ with the string $0^L 1^L$, and each
1 in $E'(m)$ with the string $1^L 0^L$. Thus $E(m)$ is a string of length $2LN = MT$. The sender transmits the
string $E(m)$ over the channel.
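
The block substitution in the sender protocol is simple enough to write down directly. The sketch below assumes the outer codeword $E'(m)$ is already given as a list of bits; the function name is hypothetical and the outer code itself is not implemented.

```python
def encode_blocks(codeword, L):
    """Replace each 0 by 0^L 1^L and each 1 by 1^L 0^L."""
    signal = []
    for bit in codeword:
        signal.extend([bit] * L + [1 - bit] * L)
    return signal

example = encode_blocks([0, 1], 2)
print(example)
```

Each codeword bit becomes one block of $2L$ microintervals, so a codeword of length $N$ yields a signal of length $2LN$.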

Receiver Protocol: Recall that the receiver receives, at every microinterval of time $t \in [MT]$, the quantity
$Y_t = \sum_{j \le t \,:\, j + \Delta(j) = t} (X_j \oplus \xi(j))$. For an interval $I \subseteq [MT]$, let $Y(I) = \sum_{t \in I} Y_t$. The decoding algorithm
$D = D_T$, on input $Y_1, \ldots, Y_{MT}$, works as follows:

1. For $i = 1$ to $N$ do:

(a) Let $\alpha_i = Y(\Gamma'_{2i-1})/L'$.

(b) If $Y(\Gamma'_{2i}) - Y(\Gamma'_{2i-1}) < (-\alpha_i + \frac{1}{2}) \cdot LL'/M$ then set $w_i = 1$, else set $w_i = 0$.

2. Output $D'(w)$.
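
Step 1 of the receiver protocol can be sketched as below. This is an illustration under assumptions: the threshold scale $LL'/M$ is taken from the gap between the 0-block and 1-block expectations computed in the analysis, the input is a 0-indexed list of per-microinterval counts, and the outer decoder $D'$ is omitted. The function name and the toy inputs are hypothetical.

```python
def decode_blocks(Y, L, L_prime, M):
    """Recover one bit per block from per-microinterval counts Y (0-indexed list)."""
    def tail_sum(i):
        # Y(Gamma'_i): the last L' microintervals of the i-th length-L interval (i is 1-indexed).
        return sum(Y[i * L - L_prime:i * L])

    num_blocks = len(Y) // (2 * L)
    w = []
    for i in range(1, num_blocks + 1):
        first = tail_sum(2 * i - 1)
        second = tail_sum(2 * i)
        alpha = first / L_prime                       # estimate of the queue load
        threshold = (-alpha + 0.5) * L * L_prime / M  # assumed scale L*L'/M
        w.append(1 if second - first < threshold else 0)
    return w

# Toy inputs with L = 8, L' = 4, M = 16 (illustrative only).
Y_drop = [0] * 4 + [1, 1, 1, 1] + [0] * 4 + [1, 0, 0, 0]
Y_flat = [0] * 4 + [1, 1, 1, 1] + [0] * 4 + [1, 1, 1, 1]
print(decode_blocks(Y_drop, 8, 4, 16), decode_blocks(Y_flat, 8, 4, 16))
```

A sharp drop in received 1s between the two half-blocks is read as a 1-block (1s sent first, then 0s); otherwise the block is read as a 0-block.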

Analysis: By the error-correction properties of the pair $E', D'$, it suffices to show that for a $(19/24)$-fraction
of the indices $i \in [N]$, we have $w_i = E'(m)_i$.

Fix an $i \in [N]$ and let $Q_i$ denote the number of 1's in the queue at the beginning of interval $\Gamma'_{2i-1}$. We
enumerate a series of "bad events" for interval $i$ and show that if none of them happen, then $w_i = E'(m)_i$.
Later we show that with probability $(1 - \exp(-T))$ the number of bad $i$'s is less than $(5/24)N$, yielding the
lemma.

(Footnote to the choice $\epsilon < 1/64$ above: For clarity of exposition, we do not make any attempt to optimize the bound on the value of $\epsilon$.)

We start with the bad events:

$\mathcal{E}_1(i)$: $Q_i > cM$ (for appropriately chosen constant $c$). We refer to $i$ as heavy (or more specifically $c$-heavy)
if this happens.

$\mathcal{E}_2(i)$: The number of errors introduced by the adversary in the interval $\Gamma_{2i}$ is more than $16\epsilon L$. We refer to
$i$ as corrupted if this happens.

$\mathcal{E}_3(i)$: $i$ is not $c$-heavy but one of $Y(\Gamma'_{2i-1})$ or $Y(\Gamma'_{2i})$ deviates from its expectation by more than $\omega(M^{1/2})$.
We refer to $i$ as deviant if this happens.

In the absence of events $\mathcal{E}_1, \mathcal{E}_2, \mathcal{E}_3$, we first show that $w_i = E'(m)_i$. Denote $i$ to be a 1-block if
$E'(m)_i = 1$ and a 0-block otherwise. To see this, we first compute the expected values of $Y(\Gamma'_{2i-1})$ and
$Y(\Gamma'_{2i})$ conditioned on $i$ being a 0-block and $i$ being a 1-block. (We will show that these expectations differ
by roughly $LL'/M$, and this will overwhelm the deviations allowed for non-deviant $i$'s.)

We start with the following simple claim. 

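Claim 4.2 Let $\ell_1, \ell_2$ be a pair of non-negative integers, and let $\mathcal{E}$ denote the event that a packet $p$ that is in
the delay queue at some time $t$ leaves the queue during the interval $\{t + \ell_1 + 1, \ldots, t + \ell_1 + \ell_2\}$. Then

\[ \left(1 - \frac{\ell_1}{M}\right)\left(\frac{\ell_2}{M} - \frac{\ell_2^2}{M^2}\right) \le \Pr[\mathcal{E}] \le \left(1 - \frac{\ell_1}{M} + \frac{\ell_1^2}{M^2}\right)\left(\frac{\ell_2}{M}\right). \]

Thus if $\ell_1 = 0$ and $\ell_2 \le M$, then $\frac{\ell_2}{M} - O\left(\left(\frac{\ell_2}{M}\right)^2\right) \le \Pr[\mathcal{E}] \le \frac{\ell_2}{M}$.
Proof: Note that

\[ \Pr[\mathcal{E}] = \left(1 - \frac{1}{M}\right)^{\ell_1}\left(1 - \left(1 - \frac{1}{M}\right)^{\ell_2}\right). \]

Using the fact that for any non-negative integer $\ell$,

\[ 1 - \frac{\ell}{M} \le \left(1 - \frac{1}{M}\right)^{\ell} \le 1 - \frac{\ell}{M} + \frac{\ell^2}{M^2}, \]

we get the bounds in the claim. □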
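
The two-sided bound of Claim 4.2 can be checked with exact arithmetic. The sketch below assumes the geometric delay model used later in the section (each microinterval, a queued packet departs independently with probability $1/M$); the function names and the test values of $M$, $\ell_1$, $\ell_2$ are illustrative assumptions.

```python
from fractions import Fraction

def departure_probability(M, l1, l2):
    """Exact probability that a queued packet stays l1 microintervals and
    then departs within the next l2, with per-step departure probability 1/M."""
    stay = 1 - Fraction(1, M)
    return stay ** l1 * (1 - stay ** l2)

def claim_bounds(M, l1, l2):
    """Lower and upper bounds in the form stated in Claim 4.2."""
    x1, x2 = Fraction(l1, M), Fraction(l2, M)
    lower = (1 - x1) * (x2 - x2 ** 2)
    upper = (1 - x1 + x1 ** 2) * x2
    return lower, upper

M = 256                                   # illustrative value only
checks = []
for l1 in (0, 64, 128):
    for l2 in (1, 16, 64):
        lower, upper = claim_bounds(M, l1, l2)
        exact = departure_probability(M, l1, l2)
        checks.append(lower <= exact <= upper)
print(all(checks))
```

Working with `Fraction` avoids any floating-point doubt about borderline cases such as $\ell_1 = 0$, $\ell_2 = 1$, where the upper bound is tight.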
Let $Q_i = \alpha \cdot M$. We now analyze the expectations of the relevant $Y(\cdot)$'s. We analyze them under the
conditions that $\alpha$ is bounded by the constant $c$ (i.e. $i$ is not heavy) and that $i$ is not corrupt.

$E[Y(\Gamma'_{2i-1})]$: The probability that a single packet leaves the queue in this interval is roughly $L'/M \pm O((L'/M)^2)$
(by Claim 4.2 above). The expected number of packets that were in the queue at the
beginning of $\Gamma'_{2i-1}$ that leave the queue in this interval is thus $(\alpha \cdot M \cdot L'/M) \pm O((L')^2/M)$. Any
potential new packets that arrive during this phase contribute another $O((L')^2/M)$ potential packets,
thus yielding $E[Y(\Gamma'_{2i-1})] = \alpha L' \pm O(\sqrt{M}) = \alpha M^{3/4} \pm O(\sqrt{M})$.

$E[Y(\Gamma'_{2i})]$ when $i$ is a 1-block: Recall that a 1-block involves transmission of 1s in $\Gamma_{2i-1}$ and 0s in $\Gamma_{2i}$.
With the adversary corrupting up to $16\epsilon L$ packets in $\Gamma_{2i}$ and the addition of $L'$ new 1s in the interval
$\Gamma'_{2i-1}$, at most $L' + 16\epsilon L$ new ones may be added to the queue by the beginning of the interval $\Gamma'_{2i}$.
Using Claim 4.2 with $\ell_1 = L$ and $\ell_2 = L'$ to the $Q_i$ packets from the beginning of interval $\Gamma'_{2i-1}$, and
with $\ell_1 = 0$ and $\ell_2 = L'$ to the new packets that may have been added, we get that

\[ E[Y(\Gamma'_{2i})] \le \left(1 - \frac{L}{M} + \frac{L^2}{M^2}\right) \cdot \left(\frac{L'}{M}\right) \cdot \alpha \cdot M + \left(\frac{L'}{M}\right) \cdot (16\epsilon L + L'). \]

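$E[Y(\Gamma'_{2i})]$ when $i$ is a 0-block: In this case $\Gamma_{2i-1}$ is all 0s and $\Gamma_{2i}$ is all 1s. So the number of 1s seen
in $\Gamma'_{2i}$ should be more than the number of 1s seen in the 1-block case. In this case, the number of
new 1s added to the queue in the intervals $\Gamma'_{2i-1}$ and $\Gamma_{2i} - \Gamma'_{2i}$ is lower bounded by $L - L' - 16\epsilon L$.
Using Claim 4.2 again to account for the departures from $Q_i$ as well as the new arrivals in the interval
$\Gamma_{2i}$, we get

\begin{align*}
E[Y(\Gamma'_{2i})] &\ge \left(1 - \frac{L}{M}\right) \cdot \left(\frac{L'}{M} - O\left(\frac{(L')^2}{M^2}\right)\right) \cdot \alpha \cdot M \\
&\quad + (L - 16\epsilon L - L') \cdot \left(1 - \frac{L}{M}\right) \cdot \left(\frac{L'}{M} - O\left(\frac{(L')^2}{M^2}\right)\right) \\
&= \alpha M^{3/4} - (\alpha - (1 - 16\epsilon)) \cdot \frac{LL'}{M} \pm O(\sqrt{M}).
\end{align*}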

Putting the above together we see that the expectations of $Y(\Gamma'_{2i-1})$ and $Y(\Gamma'_{2i})$ share a leading term of $\alpha M^{3/4}$ in both cases
($i$ being a 0-block or $i$ being a 1-block), but the second order terms are different, and these are noticeably
different. Now, if we take into account the fact that the event $\mathcal{E}_3(i)$ does not occur ($i$ is not deviant), then we
conclude that the deviations do not alter even the second order terms. We thereby conclude that if none of
the events $\mathcal{E}_1(i)$ or $\mathcal{E}_2(i)$ or $\mathcal{E}_3(i)$ occur, then $w_i = E'(m)_i$.

We now reason about the probabilities of the three events. The simplest to count is $\mathcal{E}_2(i)$. By a simple
averaging argument, at most $(1/8)$th of all indices $i$ can have $i$ corrupt, since the total number of noise errors
is bounded by $\epsilon(2LN)$, and so the probability of $\mathcal{E}_2(i)$ is zero on at least a $(7/8)$th fraction of indices. $\mathcal{E}_3(i)$
can be analyzed using standard tail inequalities. Conditioned on $i$ being not $c$-heavy, each $Y(\cdot)$ is a sum
of at most $(cM + L + L')$ independent random variables (each indicating whether a given packet departs the
queue in the specified interval). The probability that this sum deviates from its expectation by $\omega(\sqrt{M})$ is
$o(1)$. Thus, the probability that $\mathcal{E}_3(i)$ happens for more than a $(1/24)$th fraction of indices $i$ can again be
bounded by $\exp(-T)$ by Chernoff bounds.

The only remaining event is $\mathcal{E}_1(i)$. Lemma 4.3 below shows that we can pick $c$ large enough to make
sure the number of heavy $i$'s is at most a $(1/24)$th fraction of all $i$'s, with probability at least $1 - \exp(-T)$.
We conclude that with probability at least $1 - \exp(-T)$ the decoder decodes the message $m$ correctly. □

Lemma 4.3 For every $\delta > 0$, there exists a $c = c(\delta)$ such that the probability that more than a $\delta$-fraction of
the indices $i$ are $c$-heavy is at most $e^{-(MT)/4}$.

Proof: Recall that an interval $i$ is $c$-heavy if $Q_i > cM$. We will show that the lemma holds for $c = 4/\delta$.

For each packet $j$, recall that $\Delta(j)$ indicates the number of microintervals for which the packet $j$ stays
in the queue. Let $W = \sum_j \Delta(j)$. We will bound the probability that $W$ is "too large" and then use this to
conclude that the probability that too many intervals are heavy is small.
Note $W$ is the sum of $MT$ identical and independent geometric variables (namely the $\Delta(j)$'s) with
expectation of each being $M$. Thus the probability that $W > K$ (for any $K$) is exactly the probability
that $K$ independent Bernoulli random variables with mean $1/M$ sum to less than $MT$. We can bound the
probability of this using standard Chernoff bounds. Setting $K = 2 \cdot M^2 \cdot T$, we thus get:

\[ \Pr[W > 2M^2T] = \Pr[W > 2E[W]] \le \exp(-MT/4). \]
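
The tail bound above can be checked exactly for small parameters, using the equivalence between $W > K$ and a binomial count falling below $MT$. The values $M = 4$, $T = 2$ and the function name are assumptions chosen to keep the sum tiny; the bound $\exp(-MT/4)$ is the standard multiplicative Chernoff bound $\exp(-d^2 m/2)$ with relative deviation $d = 1/2$ and mean $m = 2MT$.

```python
from math import comb, exp

def prob_W_exceeds(M, T):
    """Pr[Binomial(2*M*M*T, 1/M) < M*T], which equals Pr[W > 2*M*M*T]."""
    K, p = 2 * M * M * T, 1 / M
    return sum(comb(K, s) * p**s * (1 - p)**(K - s) for s in range(M * T))

M, T = 4, 2                      # illustrative values only
tail = prob_W_exceeds(M, T)
bound = exp(-M * T / 4)
print(tail, bound)
```

For these values the exact tail is well below the Chernoff bound, as expected from a bound that is not tight at small scale.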

It then suffices to show that conditioned on $W \le 2M^2T$, the fraction of $c$-heavy intervals (i.e., intervals
where the queue contains more than $(4M)/\delta$ packets) is bounded by $\delta$.

In order to bound the number of $c$-heavy intervals using the bound on $W$, we first note that $W = \sum_{t=1}^{MT} N_t$,
where $N_t$ denotes the number of packets in the queue at time $t$ (counted in microintervals).
Furthermore, since the number of packets in the queue can go up by at most one per microinterval, we see
that heavy intervals contribute a lot to $W$. To make this argument precise, we partition time into chunks
containing $M/\delta$ microintervals each (note that "chunks" are much larger than the "blocks"). We assume
here that $M/\delta$ is an integer for notational simplicity. For $1 \le \ell \le \delta T$, the chunk $C_\ell$ spans the range
$[\ell(M/\delta), (\ell + 1)(M/\delta))$. We say a chunk $C_\ell$ is bad if the queue contains more than $(3M)/\delta$ packets at the
beginning of the chunk, and say that it is good otherwise. On the one hand, if a chunk is good, then every
interval contained inside the chunk has at most $(4M)/\delta$ packets in the queue, and is hence not $c$-heavy. On
the other hand, if a chunk $C_\ell$ is bad, then its contribution to $W$ (i.e., $\sum_{t \in C_\ell} N_t$) is at least $(M/\delta)(2M/\delta)$
(since $2M/\delta$ is the minimum of $N_t$ for $t \in C_\ell$). This allows us to show that at most a $\delta$-fraction of chunks can
be bad. To see this, suppose $\delta_b$ is the fraction of bad chunks. Then we have

\[ W = \sum_{t=1}^{MT} N_t \ge \delta_b(\delta T)(M/\delta)(2M/\delta) = 2(\delta_b/\delta)M^2T. \]

Now using $W \le 2M^2T$, we get $\delta_b \le \delta$. Finally note that if a $1 - \delta$ fraction of the chunks are good, then a $1 - \delta$
fraction of the blocks are not $c$-heavy, which completes the proof of the lemma. □
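
The identity $W = \sum_j \Delta(j) = \sum_t N_t$ that drives the proof above is a double-counting argument, and it can be confirmed on a simulated queue. The sketch below makes several assumptions for illustration: one packet arrives per microinterval, delays are geometric with mean $M$, time is 0-indexed, and the sum over $t$ runs until the queue has drained (slightly past $MT$). The function names and parameters are hypothetical.

```python
import random

def sample_delay(M, rng):
    """Geometric delay: number of microintervals until a 1/M-probability departure."""
    d = 1
    while rng.random() >= 1 / M:
        d += 1
    return d

def queue_lengths(delays):
    """N_t for t = 0, 1, ...: packet j arrives at time j and occupies the
    queue during microintervals j, ..., j + delays[j] - 1."""
    horizon = max(j + d for j, d in enumerate(delays))
    N = [0] * horizon
    for j, d in enumerate(delays):
        for t in range(j, j + d):
            N[t] += 1
    return N

rng = random.Random(1)
M, T = 8, 5                      # illustrative values only
delays = [sample_delay(M, rng) for _ in range(M * T)]
N = queue_lengths(delays)
W = sum(delays)
print(W, sum(N), max(N))
```

The same trace also shows the second fact used in the proof: the queue length never grows by more than one between consecutive microintervals.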

5 Conclusions 

Our findings, in particular the result that the channel capacity is unbounded in the setting of probabilistic 
error and delay, are surprising. They seem to run contrary to most traditional intuition about communication: 
all attempts at reliable communication, either in the formal theory of Shannon, or in the organic processes 
that led to the development of natural languages, are built on a discrete communication model (with finite 
alphabet and discrete time), even when implemented on physical (continuous time and alphabet) commu- 
nication channels. In turn such assumptions also form the basis for our model of computing (the Turing 
model) and the discrete setting is crucial to its universality. In view of the central role played by the choice 
of finite alphabet in language and computation, it does make sense to ask how much of this is imposed by 
nature (and the unreliability/uncertainty it introduces) and how much due to the convenience/utility of the 
model. 

Of course, our results only talk about the capacity of a certain mathematical model of communication, 
and don't necessarily translate into the physical world. The standard assumption has been that a fixed 
communication channel, say a fixed copper wire, has an associated finite limit on its ability to transmit bits 
(reliably). We discuss below some of the potential reasons why this assumption may hold and how that
contrasts with our results:

Finite Universe One standard working assumption in physics is that everything in the universe is finite and
discrete and the continuous modeling is just a mathematical abstraction. While this may well be true, 
this points to much (enormously) larger communication capacities for the simple copper wire under 
consideration than the limits we have gotten to. Indeed in this case, infinity would be a pretty good 
abstraction also to the number of particles in the universe, and thus of the channel capacity. We note 
here that channel capacity has been studied from a purely physics perspective and known results give 
bounds on the communication rate achievable in terms of physical limits imposed by channel cross 
section, available power, Planck constant, and speed of light (see, for example, HI El). 

Expensive Measurements A second source of finiteness might be that precise measurements are expensive, 
and so increasing the capacity does come at increased cost. Again, this may well be so, but even if 
true suggests that we could stay with existing trans-oceanic cables and keep enhancing their capacity 
by just putting better signaling/receiving instruments at the two endpoints - a somewhat different 
assumption than standard ones that would suggest the wires have to be replaced to increase capacity. 

Band-limited Communication A third possibility could be that signalling is inherently restricted to trans- 
mitting from the linear span of a discrete and bounded number of basis functions. As a physical 
assumption on nature, this seems somewhat more complex than the assumption of probabilistic nois- 
iness, and, we believe, deserves further explanation/exploration. 

Adversaries Everywhere Finally, there is always the possibility that the probabilistic modelling is too 
weak to model even nature and we should really consider the finite limits obtained in the adversarial 
setting as the correct limits. Despite our worst-case upbringing, this does seem a somewhat paranoid 
view of nature. Is there really an adversary sitting in every piece of copper wire?